Neural network approach to parton distributions fitting 

The NNPDF Collaboration: Andrea Piccione a , Luigi Del Debbio b , Stefano Forte c , Jose I. Latorre d and 
Joan Rojo d . 

a Dipartimento di Fisica Teorica, Universita di Torino, and INFN, Sezione di Torino, 
Via P. Giuria 1, I-10125, Italy 

b Theory Division, CERN, 
CH-1211 Geneve 23, Switzerland 

c Dipartimento di Fisica, Universita di Milano and INFN, Sezione di Milano, 
Via Celoria 16, I-20133, Italy 

d Departament d'Estructura i Constituents de la Materia, Universitat de Barcelona, 
Diagonal 647, E-08028 Barcelona, SPAIN 

We show an application of neural networks to extract information on the structure of hadrons. A Monte 
Carlo sampling of the experimental data is performed to correctly reproduce data errors and correlations. 
A neural network is then trained on each Monte Carlo replica via a genetic algorithm. Results on the proton 
and deuteron structure functions, and on the nonsinglet parton distribution, are shown. 



1. Introduction 

The requirements of precision physics at hadron colliders have recently led to a rapid 
improvement in the techniques for the determination of the structure of the nucleon. 
In this context factorization is a crucial issue. Indeed, it ensures that we can extract 
the parton structure of the nucleon from a process with only one initial proton (say, 
Deep Inelastic Scattering at HERA), and then use it as an input for a process where two 
initial protons are involved (Drell-Yan at LHC). In the QCD improved parton model the 
DIS structure function of the nucleon can be written as 



$$F_2(x,Q^2) = x \left[ \sum_{q=1}^{n_f} e_q^2 \, C^q \otimes q_q(x,Q^2) + 2 n_f \, C^g \otimes g(x,Q^2) \right], \tag{1}$$

where $Q^2 = -q^2 = -(k-k')^2$, $x = Q^2/2p\cdot q$, and $p$, $k$ and $k'$ are the momenta 
of the initial nucleon, the incoming lepton, and the scattered lepton respectively; $C^i$ 
are the perturbatively calculable coefficient functions, and $q_q(x,Q^2)$ and $g(x,Q^2)$ 
are the quark and gluon distributions that describe the nonperturbative dynamics, the 
so-called Parton Distribution Functions (PDFs). 

The extraction of a PDF from experimental data is not trivial, even though it is a 
well-established task. To do so we have to evolve the PDFs to the scale of the data, 
perform the x-convolution, add theoretical uncertainties (resummation, nuclear corrections, 
higher twist, heavy quark thresholds, ...), and then deconvolute in order to obtain a 
function of $x$ at a common scale $Q^2$. 

Recently it has been pointed out that the uncertainty associated with a PDF set is crucial 
[1,2,3]. The uncertainty on a PDF is given by the probability density $\mathcal{P}[f]$ in 
the space of functions $f(x)$, that is, the measure we use to perform the functional 
integral that gives the expectation value 

$$\langle \mathcal{F}[f(x)] \rangle = \int \mathcal{D}f \, \mathcal{F}[f(x)] \, \mathcal{P}[f(x)], \tag{2}$$

where $\mathcal{F}[f]$ is an arbitrary function of $f(x)$. Thus, when we extract a PDF we 
want to determine an infinite-dimensional object (a function) from a finite set of data 
points, and this is a mathematically ill-posed problem. 

The standard approach is to choose a simple functional form with enough free parameters, 
for example $q(x,Q_0^2) = x^{\alpha} (1-x)^{\beta} P(x)$, and to fit the parameters by 
minimizing $\chi^2$. Some difficulties arise: the errors and correlations of the parameters 
require at least a fully correlated analysis of the data errors; error propagation to 
observables is difficult, since many observables are nonlinear or nonlocal functionals of 
the parameters; and the theoretical bias due to the choice of parametrization is difficult 
to assess (the effects can be large if the data are not precise or hardly compatible). 
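
To make the standard approach concrete, the following is a minimal sketch, assuming a polynomial $P(x)$, noiseless toy pseudo-data and an invented covariance matrix. The names `pdf_form`, `chi2` and all numerical values are hypothetical and are not taken from any published fit; the form is compared directly to the toy data, so evolution and convolution are left out.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def pdf_form(x, alpha, beta, coeffs):
    """q(x) = x^alpha * (1 - x)^beta * P(x), with P(x) = c0 + c1*x + ..."""
    return x**alpha * (1.0 - x)**beta * npoly.polyval(x, coeffs)

def chi2(params, x, data, cov):
    """Fully correlated chi^2 for params = (alpha, beta, c0, c1, ...)."""
    alpha, beta, *coeffs = params
    resid = data - pdf_form(x, alpha, beta, coeffs)
    return float(resid @ np.linalg.solve(cov, resid))

# Toy pseudo-data generated from the form itself (no noise), purely illustrative.
x = np.linspace(0.05, 0.9, 12)
true_params = (0.5, 3.0, 1.0, 2.0)
data = pdf_form(x, 0.5, 3.0, [1.0, 2.0])
# Assumed covariance: 5% uncorrelated errors plus a small fully correlated term.
cov = np.diag((0.05 * data) ** 2) + 1e-6

# Crude one-dimensional scan in alpha with the other parameters held fixed.
alphas = np.linspace(0.3, 0.7, 41)
scan = [chi2((a, 3.0, 1.0, 2.0), x, data, cov) for a in alphas]
best_alpha = alphas[int(np.argmin(scan))]
```

The scan recovers the input exponent only because the pseudo-data were generated with the same functional form; with real data the choice of form is exactly the source of bias discussed above.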

Here we present an alternative approach to this problem. First we show our technique 
applied to the determination of the structure functions. This is the easiest case, since 
no evolution is required, only data fitting, and it is therefore a good test of the 
technique. Then we show how this approach can be extended to the determination of the PDFs. 

2. Structure functions 

The strategy presented in [4,5] to address the problem of parametrizing deep inelastic 
structure functions $F(x,Q^2)$ is a combination of two techniques: a Monte Carlo sampling 
of the experimental data and a neural network training on each data replica. 

The Monte Carlo sampling of the experimental data is performed by generating $N_{rep}$ 
replicas of the original $N_{dat}$ experimental data, 

$$F_i^{(art)(k)} = \left(1 + r_N^{(k)} \sigma_N\right) \left[ F_i^{(exp)} + r_i^{stat,(k)} \sigma_i^{stat} + \sum_{l=1}^{N_{sys}} r^{sys,l,(k)} \sigma_i^{sys,l} \right], \tag{3}$$

where $i = 1, \ldots, N_{dat}$, $r$ are gaussian random numbers with the same correlation 
as the respective uncertainties, and $\sigma^{stat}$, $\sigma^{sys}$, $\sigma_N$ are the 
statistical, systematic and normalization errors. The number of replicas $N_{rep}$ has to 
be large enough so that the replica sample reproduces central values, errors and 
correlations of the experimental data. 
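
The replica generation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: statistical fluctuations are independent point by point, each systematic source is fully correlated across all points, and a single normalization error is shared by the whole data set. The function name `make_replicas` and the toy numbers are hypothetical.

```python
import numpy as np

def make_replicas(f_exp, sig_stat, sig_sys, sig_norm, n_rep, rng):
    """Generate Monte Carlo replicas of a data set.

    f_exp:    (n_dat,) central values
    sig_stat: (n_dat,) statistical errors
    sig_sys:  (n_sys, n_dat) systematic errors, one row per source
    sig_norm: float, relative normalization error
    Returns an array of shape (n_rep, n_dat).
    """
    n_sys, n_dat = sig_sys.shape
    r_stat = rng.standard_normal((n_rep, n_dat))  # independent for every point
    r_sys = rng.standard_normal((n_rep, n_sys))   # one number per source, shared by all points
    r_norm = rng.standard_normal((n_rep, 1))      # one number per replica
    shifted = f_exp + r_stat * sig_stat + r_sys @ sig_sys
    return (1.0 + r_norm * sig_norm) * shifted

# Toy data set with three points and one systematic source (invented values).
rng = np.random.default_rng(0)
f_exp = np.array([0.40, 0.35, 0.30])
sig_stat = np.array([0.010, 0.010, 0.010])
sig_sys = np.array([[0.008, 0.008, 0.008]])
sig_norm = 0.02
replicas = make_replicas(f_exp, sig_stat, sig_sys, sig_norm, n_rep=1000, rng=rng)
```

Comparing the sample mean and covariance of `replicas` with the input central values and errors is a direct way to check whether the chosen number of replicas is large enough.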



The second step is to train a neural network on each data replica. A neural network [6] 
is a highly nonlinear mapping between input and output patterns as a function of its 
parameters. We choose an architecture with 4 inputs ($x$, $\log x$, $Q^2$, $\log Q^2$), 
two hidden layers with 5 and 3 neurons respectively, and one output, $F(x,Q^2)$. The 
training on each replica is performed in two steps. First, we use the Back Propagation 
technique to minimize 

$$\chi^{2\,(k)} = \frac{1}{N_{dat}} \sum_{i=1}^{N_{dat}} \left( \frac{F_i^{(art)(k)} - F_i^{(net)(k)}}{\sigma_i} \right)^2; \tag{4}$$

then, we use the Genetic Algorithm [7] to minimize 

$$\chi^{2\,(k)} = \frac{1}{N_{dat}} \sum_{i,j=1}^{N_{dat}} \left( F_i^{(art)(k)} - F_i^{(net)(k)} \right) V^{-1}_{ij} \left( F_j^{(art)(k)} - F_j^{(net)(k)} \right). \tag{5}$$

Here $\sigma_i$ is the total error on the $i$-th data point and $V_{ij}$ is the 
experimental covariance matrix. The Back Propagation technique allows for a fast 
minimization, but the error oscillates during the training, while with the Genetic 
Algorithm the error never increases, which makes it more suitable for the last part of 
the training, where the stability of the $\chi^2$ is needed (see Fig. 1). 
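
As an illustration of the architecture and of the two error functions, here is a minimal NumPy sketch. It assumes sigmoid activations in the hidden layers, a linear output neuron and random gaussian initial parameters; these choices, and the names `init_params`, `network`, `error_diag` and `error_cov`, are assumptions made for the example and are not specified in the text.

```python
import numpy as np

LAYERS = (4, 5, 3, 1)  # 4 inputs, hidden layers of 5 and 3 neurons, 1 output

def init_params(rng, layers=LAYERS):
    """Random weights and thresholds for each layer (assumed initialization)."""
    return [(rng.standard_normal((n_in, n_out)), rng.standard_normal(n_out))
            for n_in, n_out in zip(layers[:-1], layers[1:])]

def network(params, x, q2):
    """Map (x, log x, Q^2, log Q^2) to F(x, Q^2)."""
    a = np.stack([x, np.log(x), q2, np.log(q2)], axis=-1)
    for w, b in params[:-1]:
        a = 1.0 / (1.0 + np.exp(-(a @ w + b)))  # sigmoid hidden layers (assumption)
    w, b = params[-1]
    return (a @ w + b)[..., 0]                   # linear output (assumption)

def error_diag(f_net, f_art, sigma):
    """Diagonal error function: mean squared residual in units of sigma."""
    return float(np.mean(((f_art - f_net) / sigma) ** 2))

def error_cov(f_net, f_art, cov):
    """Fully correlated error function built from the covariance matrix."""
    resid = f_art - f_net
    return float(resid @ np.linalg.solve(cov, resid)) / resid.size

# Toy evaluation on three kinematic points (invented values).
rng = np.random.default_rng(0)
params = init_params(rng)
x = np.array([0.1, 0.3, 0.5])
q2 = np.array([5.0, 10.0, 20.0])
f_net = network(params, x, q2)
f_art = np.array([0.40, 0.35, 0.30])
sigma = np.array([0.01, 0.02, 0.03])
```

With this architecture the network has 47 free parameters, and the two error functions coincide when the covariance matrix is diagonal.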
Once we have trained all the neural networks on the data replicas, we have a probability 
density in the space of structure functions, $\mathcal{P}[F(x,Q^2)]$, which contains all 
the information from the experimental data, including correlations. Expectation values 
over this probability measure are then evaluated as averages over the trained network 
sample, 



$$\left\langle \mathcal{F}[F(x,Q^2)] \right\rangle = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} \mathcal{F}\left( F^{(net)(k)}(x,Q^2) \right). \tag{6}$$



In Fig. 2 we show our results$^1$ for the deuteron structure function $F_2^d(x,Q^2)$ [4], 
and for the proton structure function $F_2^p(x,Q^2)$ [5], compared to a polynomial 
parametrization [8]. We observe that in the data range the two fits agree within errors. 
In the extrapolation region the error band of the polynomial fit has the same narrow width 
as in the data range, while the error band of the neural networks grows, indicating that 
we are in a region where the error is undetermined since there are no data. 

1 The source code, driver program and graphical web interface for our structure function 
fits are available at http://sophia.ecm.ub.es/f2neural 

Figure 1. Dependence of the $\chi^2$ on the length of training: (big pad) total training, 
(small pad) detail of the GA training. 

Neural networks turn out to be a suitable tool also in the presence of incompatible data. 
Indeed, once a good fit is obtained, say a stable value of $\chi^2 \sim 1$, the neural 
networks infer a natural law by following the regularity of the data, and incompatible 
data are discarded without any hypothesis on the shape of the parametrization (see Fig. 3). 
Figure 2. Neural network fit (NNPDF) compared to a polynomial fit (SMC) for the deuteron 
and the proton structure functions. 



3. Parton distributions 

The strategy presented in the above section can be used to parametrize parton 
distributions as well, provided one now takes into account Altarelli-Parisi QCD evolution. 

Now neural networks are used to parametrize the PDF at a reference scale. We choose an 
architecture with 2 inputs ($x$, $\log x$), two hidden layers with 2 neurons each, and one 
output, $q(x,Q_0^2)$. The training on each replica is performed only with the Genetic 
Algorithm, since the function to be minimized is now nonlocal (see eqs. (1) and (5)). 
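
A gradient-free minimization of this kind can be sketched as follows. This is a deliberately simplified, elitist variant (mutate the current best parameter vector and keep a mutant only if it lowers the error), not necessarily the algorithm of [7]. The toy error function compares the network directly with invented target values, so the evolution and convolution steps are omitted; the names `net_pdf` and `genetic_minimize`, the sigmoid activation and all numerical settings are assumptions.

```python
import numpy as np

LAYERS = (2, 2, 2, 1)  # inputs (x, log x), two hidden layers of 2 neurons, 1 output
N_PARAMS = sum(n_in * n_out + n_out for n_in, n_out in zip(LAYERS[:-1], LAYERS[1:]))

def net_pdf(theta, x):
    """Evaluate the 2-2-2-1 network from a flat parameter vector theta."""
    a = np.stack([x, np.log(x)], axis=-1)
    pos, n_layers = 0, len(LAYERS) - 1
    for layer, (n_in, n_out) in enumerate(zip(LAYERS[:-1], LAYERS[1:])):
        w = theta[pos:pos + n_in * n_out].reshape(n_in, n_out)
        pos += n_in * n_out
        b = theta[pos:pos + n_out]
        pos += n_out
        a = a @ w + b
        if layer < n_layers - 1:
            a = 1.0 / (1.0 + np.exp(-a))  # sigmoid hidden layers (assumption)
    return a[..., 0]

def genetic_minimize(error, theta0, n_gen, n_mut, step, rng):
    """Each generation: mutate the best vector n_mut times, keep the best candidate."""
    best, best_err = theta0.copy(), error(theta0)
    history = [best_err]
    for _ in range(n_gen):
        mutants = best + step * rng.standard_normal((n_mut, best.size))
        errs = np.array([error(m) for m in mutants])
        i = int(np.argmin(errs))
        if errs[i] < best_err:          # the parent survives unless a mutant is better
            best, best_err = mutants[i].copy(), float(errs[i])
        history.append(best_err)
    return best, history

# Toy target values (not a real PDF) and a simple mean-squared error.
rng = np.random.default_rng(0)
x = np.linspace(0.05, 0.9, 20)
target = np.sqrt(x) * (1.0 - x) ** 3
error = lambda theta: float(np.mean((net_pdf(theta, x) - target) ** 2))
theta0 = rng.standard_normal(N_PARAMS)
best, history = genetic_minimize(error, theta0, n_gen=200, n_mut=20, step=0.1, rng=rng)
```

Because the parent is always retained, the recorded error can never increase from one generation to the next, and no derivative of the error function is required, which is why this kind of algorithm suits a nonlocal error function.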

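Once the fit is done, the expectation value and the error of an arbitrary function 
$\mathcal{F}$ of a PDF, or the correlation between different PDFs, can be computed in the 
following way: 

$$\left\langle \mathcal{F}[q(x)] \right\rangle = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} \mathcal{F}\left( q^{(net)(k)}(x) \right),$$

$$\sigma_{\mathcal{F}[q(x)]} = \sqrt{ \left\langle \mathcal{F}[q(x)]^2 \right\rangle - \left\langle \mathcal{F}[q(x)] \right\rangle^2 }, \tag{7}$$

$$\left\langle u(x_1)\, d(x_2) \right\rangle = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} u^{(net)(k)}(x_1,Q_0^2)\, d^{(net)(k)}(x_2,Q_0^2).$$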
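
In practice these estimators are plain sample statistics over the replica ensemble. The sketch below assumes that the values of the trained networks at the points of interest have already been collected into arrays with one entry per replica. The two toy ensembles are invented gaussian numbers, and the correlation coefficient is formed from the averages above in the standard way; all names are hypothetical.

```python
import numpy as np

def replica_mean(values):
    """Expectation value: average over the replica axis (axis 0)."""
    return values.mean(axis=0)

def replica_sigma(values):
    """Error: sqrt(<F^2> - <F>^2), clipped at zero against rounding."""
    var = (values ** 2).mean(axis=0) - values.mean(axis=0) ** 2
    return np.sqrt(np.maximum(var, 0.0))

def replica_correlation(a, b):
    """Correlation coefficient built from <a b>, <a>, <b> and the two errors."""
    cov = (a * b).mean() - a.mean() * b.mean()
    return cov / (replica_sigma(a) * replica_sigma(b))

# Toy ensembles standing in for u(x1) and d(x2) evaluated on each replica.
rng = np.random.default_rng(0)
n_rep = 1000
u_x1 = 0.60 + 0.05 * rng.standard_normal(n_rep)
d_x2 = 0.30 + 0.5 * (u_x1 - 0.60) + 0.02 * rng.standard_normal(n_rep)
rho = replica_correlation(u_x1, d_x2)
```

The same functions apply unchanged to any quantity computed replica by replica, which is what makes error propagation to nonlinear observables straightforward in this approach.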



As a first application of our method, we extract the nonsinglet parton distribution 
$q_{NS}(x,Q_0^2) = \frac{1}{6}\left(u + \bar{u} - d - \bar{d}\right)(x,Q_0^2)$ from the 
nonsinglet structure function $F_2^{NS}(x,Q^2)$ measured by the NMC and BCDMS [9,10,11] 
collaborations. The very preliminary results of a NLO fit with fully correlated 
uncertainties can be seen in Fig. 4 (only 25 replicas are used instead of 1000). The 
initial evolution scale is $Q_0^2 = 2\,\mathrm{GeV}^2$, and the kinematical cuts applied in 
order to avoid higher twist effects are $Q^2 \geq 3\,\mathrm{GeV}^2$ and 
$W^2 \geq 6.25\,\mathrm{GeV}^2$. Our result is consistent within the error bands with the 
results from other global fits [12,13], but in the small-$x$ range, where data are poor, 
the differences become more sizeable. This effect will be further investigated; however, a 
larger number of data in the small-$x$ range for the deuteron will help in clarifying this 
picture. 

Figure 3. Fixed target proton data and predictions for an x-bin. 

Figure 4. Comparison of the prediction for $F_2^{NS}(x,Q^2)$ with different PDF sets. 

Summarizing, we have described a general technique to parametrize experimental data in a 
bias-free way with a faithful estimation of their uncertainties, which has been 
successfully applied to structure functions and is now being implemented in the context 
of global parton distribution fits. 